Associations between single nucleotide polymorphisms (SNPs) at 5p15 and multiple cancer types have been
reported. We have previously shown evidence for a strong association between prostate cancer (PrCa) risk 
and rs2242652 at 5p15, intronic in the telomerase reverse transcriptase (TERT) gene that encodes TERT. To 
comprehensively evaluate the association between genetic variation across this region and PrCa, we performed
a fine-mapping analysis by genotyping 134 SNPs using a custom Illumina iSelect array or
Sequenom MassArray iPlex, followed by imputation of 1094 SNPs in 22 301 PrCa cases and 22 320 controls 
in The PRACTICAL consortium. Multiple stepwise logistic regression analysis identified four signals in the 
promoter or intronic regions of TERT that were independently associated with PrCa risk. Gene expression analysis
of normal prostate tissue showed evidence that SNPs within one of these regions also associated with TERT 
expression, providing a potential mechanism for predisposition to disease. 
INTRODUCTION

We have previously reported an association between prostate 
cancer (PrCa) risk and rs2242652 on 5p15 (1). rs2242652
lies in intron 4 of telomerase reverse transcriptase (TERT) 
that encodes TERT, the catalytic subunit of the telomerase 
ribonucleoprotein complex (2). Telomerase catalyzes the de 
novo addition of telomere repeat sequences on to chromosome 
ends and, thereby, counterbalances telomere-dependent repli- 
cative senescence. Several studies have reported an associ- 
ation between shorter telomeres in lymphocytes and 
increased risk of various cancer types (3-5), although evi- 
dence from prospective studies is ambiguous. Associations 
between single nucleotide polymorphisms (SNPs) in the 
TERT region and multiple cancer types have been reported, 
and these have been comprehensively reviewed recently (6- 
8); however, no consistent correlation has been observed 
thus far between the cancer-associated SNPs in TERT and 
either gene expression or telomere length (TL). 

Initial evidence for association with PrCa risk at 5p15 was
reported by Rafnar et al. (7) for rs401681 and rs2736098
(P = 3.6 × 10⁻⁴ and P = 1.3 × 10⁻⁴). Subsequently, we
found much stronger evidence of association for rs2242652, 
a SNP only weakly correlated with rs401681 and rs2736098 
(r² = 0.19 and r² = 0.10, respectively, in HapMap CEU) (1).
We, therefore, concluded that rs2242652 is more strongly 
associated with variant(s) causally related to PrCa risk. 
rs2242652 is strongly correlated with rs10069690 (r² =
0.80), which is associated with oestrogen receptor negative (ER-ve)
breast cancer (9), but is not correlated with SNPs previously
associated with other cancer types. The TERT locus is
characterized by low linkage disequilibrium (LD), raising 
the possibility that additional SNPs could be independently 
related to PrCa risk and that these could also differ from 
those predisposing to other cancers. 



RESULTS 

To further elucidate the association of the 5p15 TERT locus
with PrCa risk, we have performed a high-resolution fine- 
mapping of SNPs across the region through a combination 
of direct genotyping and imputation. Using a custom Illumina 
iSelect genotyping array (iCOGS) designed for the Collabora- 
tive Oncology Gene-Environment Study (http://ec.europa.eu/ 
research/health/medical-research/cancer/fp7-projects/cogs_en. 
html), we initially genotyped 114 SNPs spanning 135 kb of the
SLC6A18-TERT-CLPTM1L region in lymphocyte-extracted
DNA from 22 301 PrCa cases and 22 320 matched controls. 
These data enabled us to select a narrower 20 kb interval 
(Chr5:1278590-1299850, GRCh37/hg19) within which variants
exhibited substantially stronger associations with PrCa.
An additional 25 SNPs within this interval were genotyped 
in a subset of 2831 PrCa cases and 2440 controls by Sequenom
MassArray iPlex. We imputed all 44 621 samples genotyped 
in the iCOGS PRACTICAL (http://ccge.medschl.cam.ac.uk/ 
consortia/practical) sample set for variants in the 1000
Genomes Phase 1 integrated variant set (March 2012) for the
interval Chr5:1227693-1361669 using IMPUTE v2.2.2. Concordance
between imputed and genotyped SNPs for the 20
SNPs in the Sequenom panel that passed quality control
(QC) was >90% (Materials and Methods and Supplementary
Material, Fig. S1). Associations between PrCa risk and the
imputed dataset of 1094 SNPs were assessed using a 1 df 
trend test adjusted for study and six principal components to 
correct for inflation (10). Samples used in the analysis were 
predominantly of European single ancestry, and individuals 
with >15% minority ancestries were excluded (see Materials 
and Methods and summary data of imputation in Supplementary
Material, Table S1). This analysis identified 44 SNPs
associated with PrCa risk at P < 10⁻⁵ (Supplementary Material,
Figs S1 and S2 and Supplementary Material, Table S2). To
determine independently associated variants in this region, we 
performed forward and backward stepwise logistic regression 
(LR); SNPs were included in the model if they were significant
at P < 10⁻⁴ after adjustment for other SNPs (Table 1
and Supplementary Material, Table S2). Both regression 
models identified multiple independent associations, reflecting 
the complexity of this region. Across both models, six SNPs 
were ascertained to be independent. To further validate their 
independence, we performed an additional LR analysis using 
only these SNPs. This retained four SNPs independently sig- 
nificant at P < 0.05 (the same SNPs as were selected by the 
backwards model, Table 1). These SNPs highlight clusters 
of highly or moderately correlated variants, with only 
modest LD between these groups of variants, suggesting the 
presence of four separate regions containing PrCa risk variants 
(Fig. 1, Supplementary Material, Fig. S2). 

Region 1 begins within intron 2 and stretches into intron 4 
of TERT and contains our previously reported association 
rs2242652. This variant remained the most strongly associated 
PrCa risk SNP after univariate analysis (P = 1.0 × 10⁻²³) and
remained significant in the forward LR model, whereas the 
backward LR model identified a different significant SNP, 
rs7725218, that is only modestly correlated with rs2242652 
(r² = 0.40). However, after the multiple regression analysis,
only rs7725218 remained independently significant 
(Table 1). Several SNPs in this cluster are correlated with 
these variants at r² > 0.5, including rs10069690 that was previously
reported to be associated with ER-ve breast cancer
(9), suggesting that the prostate and breast cancer risks may 
be driven by the same variant(s). 

Region 2 is entirely situated within intron 2 of TERT and 
also contains a portion of the TERT promoter CpG island. In 
the single SNP analysis, the most significant SNP is c5-1291331
(P = 3.8 × 10⁻²³); however, this is no longer significant
at P < 10⁻⁴ after adjustment for other SNPs in the
region. Instead, in the backwards LR model, another SNP is 
identified, rs2853676, and this SNP remained independently 
significant in the final regression model. This SNP has been 
reported to be associated with risk of glioma (11). The most 
studied polymorphism of the TERT region, rs2736100, that 
was reported to be strongly associated with lung cancer and 
testicular cancer and is in a putative regulatory element (6) 
is located within this region, but this SNP is only weakly correlated
with the PrCa risk association (r² = 0.2) and was not
significant at P < 10⁻⁴ after adjustment for other SNPs.

Region 3 spans from exon 2 into the near promoter of 
TERT. The most strongly associated SNPs in the single SNP 
analysis were rs7712562 (P = 3.8 × 10⁻²³) and rs6554754
(P = 1.1 × 10⁻¹⁸). In the conditional analysis, however, the
Table 1. Results of LR analysis
The table shows SNPs that remained significant after forward or backward stepwise LR (Forward LR, Backward LR) analyses of 44 imputed or genotyped SNPs in
the TERT region associated at P < 10⁻⁵ with PrCa risk in single SNP analysis (Univariate LR). Additional LR analysis of these six SNPs showed that four SNPs
(bold) remained independently significant at P < 0.05, representing four independent regions.
a Genotyped SNPs. 



evidence for association was defined by two different SNPs, 
rs2853669 (forward model P = 1.11 × 10⁻¹¹) and
rs2736107 (backward model P = 1.16 × 10⁻¹⁹). rs2853669
has been reported previously to be associated with breast 
cancer risk (12). Two other SNPs in this region, rs2736108 
and rs2736109, which are strongly correlated (r² = 0.94),
have been reported to be associated with breast and ovarian
cancer risk (13); these two SNPs are highly correlated with
rs2736107 (r² = 0.95) that remained as an independent
signal after multiple LR, whereas rs2853669 did not 
(Table 1). Although this region extends into the coding se- 
quence, the SNPs that best define it according to the models 
are all located immediately in the 5' promoter region, suggest- 
ing that modulation of TERT transcription appears to be the 
most likely mechanism underlying the risk association at 
this region. 

The fourth association signal, rs13190087, lies 3.5 kb 5' to
TERT. This SNP is independently significant in both the 
forward and backward stepwise models and in the final regres- 
sion analysis. Furthermore, it is not correlated with any of the 
other association signals (Table 1 and Supplementary Mater- 
ial, Table S2). 

To explore the existence of specific risk haplotypes within 
the association signals, we selected SNPs correlated at r² >
0.2 with the four 'top' SNPs that had remained significant
after multiple regression. Haplotypes containing the top SNP 
and with a P-value smaller than that of any single marker 
included in the haplotype analysis are shown in Supplemen- 
tary Material, Table S3. In region 1, the A/A haplotype of 
rs2242652/rs7725218 (both minor alleles) is more significant- 
ly associated with risk than rs7725218 alone (Supplementary 
Material, Table S3b). This suggests that rs2242652 and 
rs7725218 (or markers strongly correlated with them) are 
both related to risk, but combine in a non-multiplicative 
manner, or that there is a single, as yet untested, causal 
variant in region 1 partially correlated with both markers. In 
region 3, the most significant two-marker haplotype 
(rs2736107/rs2735940) is more significant than rs2736107 
alone, again supporting the existence of either two independent
signals or a partially correlated untested causal variant.
The haplotype analysis also suggests a possible combined effect
of SNPs in regions 2 and 3; the T/T haplotype of
rs2853676/rs7449190 is more significant than the single
marker effect of rs2853676.

To investigate whether SNPs in any of these regions were 
associated with TERT gene expression, we performed quanti- 
tative PCR (qPCR) assays on RNA isolated from 195 histolo- 
gically benign prostate tissue samples using the Fluidigm 
Biomark™ HD system. These samples were identified and 
selected from core biopsy specimens from fresh frozen 
radical prostatectomy from men with elevated prostate specific 
antigen (PSA) level (median age 61 years). mRNA samples 
were analysed for TERT and CLPTM1L and normalized to 
housekeeping genes β-actin and 18S RNA. We found evidence
that the protective alleles of rs10054203, rs10069690,
rs2242652, rs7725218 and rs7713218 (all in region 1) were significantly
associated with increased TERT expression (P =
0.01-0.0009), but no association was observed for CLPTM1L 
(Fig. 2, Supplementary Material, Table S4). We found no evi- 
dence for association between any of the SNPs significant in 
the univariate analysis in regions 2-4 and TERT expression. 
This provides further evidence that the functional basis of the 
region 1 risk signal differs from that of the other regions. 

DISCUSSION 

Within the TERT locus at 5p15, we have identified four association
signals that are independently associated with PrCa
risk after multiple LR analysis (Table 1). Haplotype analyses 
also confirm the existence of four association signals, but iden- 
tify stronger risk haplotypes in three of the four regions, sug- 
gesting either the presence of untyped causal variants in these 
regions or non-multiplicative interactions between two or 
more variants. Three of these risk signals are represented by 
SNPs in localized clusters of moderate LD, whereas the 
fourth appears to be more tightly defined. These association 
regions select variants that are intronic or closely upstream 
for all known transcripts of TERT. Whereas the four SNPs 
Figure 1. Results of TERT fine-mapping analysis. (A) Regional association plot of the imputed iCOGS genotype data. Typed SNPs are indicated in red and
imputed SNPs in grey. Diamonds denote SNPs significantly associated with PrCa after multiple LR analyses. The 20 kb interval is denoted by the shaded
region that is expanded below. Forty-four SNPs were associated with PrCa risk at P < 10⁻⁵ (indicated by the red line). (B) Expanded detail for the 20 kb interval.
The positions of 42 SNPs located within this window significant at P < 10⁻⁵ are marked (42 SNPs P < 10⁻⁵), as are the 4 SNPs independently significant
after multiple LR model and SNPs that overlap with ENCODE annotations (ENCODE intersect), including DNase I hypersensitivity (DNase Clusters) and TFBS
ChIP signals. The positions of TERT gene transcripts from Ensembl 65 (TERT), CpG island regions (CpG), segmental duplications (SegDup) and ENCODE
chromatin state (Broad ChromHMM) are also indicated. The light grey rectangle (Broad ChromHMM) denotes a region of heterochromatin, the yellow rectangle
a weak enhancer and the dark grey rectangle a polycomb-repressed region. All tracks were generated using the hg19 build of the UCSC genome browser.
The locations of regions 1-4 are indicated as coloured rectangles and numbered. (C) LD plot for the 20 kb interval; r² values are derived from imputed data for
the UKGPCS subset of iCOGS samples. Triangles indicate the boundaries of regions 1-4. MLR, multiple LR.
Figure 2. mRNA expression levels in benign prostate tissue for three SNPs in region 1 of the TERT locus. A significant increase in TERT expression was
observed for the minor (protective) alleles of (A) rs10069690, (C) rs2242652 and (D) rs7725218. No effect on expression of the CLPTM1L gene was observed;
data are shown only for (B) rs10069690.



representing the independently associated signals in the re- 
gression models could be candidate causative variants for 
further analyses, any variants that are correlated with these 
SNPs could potentially confer the functional effects that 
modify disease risk. 

The regulation of TERT has been studied in much detail. 
There are transcription factor binding sites (TFBS) in the 
TERT promoter for several genes that are known to influence 
PrCa development and progression while chromatin remodel- 
ling via acetylation and methylation also appears to play a crit- 
ical role (14,15). This implies that the variants we have 
identified could manifest their effect through modification of 
these elements. We have shown that SNPs in region 1 are 
associated with TERT expression in benign prostate tissue 
(Fig. 2, Supplementary Material, Table S4) providing evidence 
that variants in this region may affect PrCa risk through regu- 
lation of gene expression. 

Our analysis identified four independent association signals 
at the TERT locus; however, the precise functional variants 
that are responsible for altering the risk of PrCa remain to 
be established and could arise through any variants in LD
with the SNPs we have identified. Comparing our findings
with functional data from the Encyclopedia of DNA Elements 
(ENCODE) Project (16) [obtained through HaploReg (17) and 
the UCSC genome browser (18)] can help to predict the most 
likely functional SNPs (Fig. 1, Supplementary Material, 
Table S5). In region 1, rs7725218, the SNP that remained sig- 
nificant in the final analysis, is situated within a DNase I 
hypersensitivity region and predicted to alter an Mrg TFBS. 
In addition, rs2242652, which is in moderate LD with 
rs7725218 (r² = 0.4), is also situated in a DNase I hypersensitivity
region and predicted to disrupt HEN1, Zfx and E2A
TFBS consensus sequences. The minor, lower risk alleles of 
both these variants are associated with increased TERT expres- 
sion (Fig. 2) that would be consistent with these SNPs modi- 
fying functional regulatory elements. In addition, another 
SNP rs7734992 also overlaps a DNase I hypersensitivity 
region and is predicted to alter an Mtf1 TFBS. Region 3
encompasses the near promoter region of the TERT gene and 
as expected contains several variants with potential functional 
effects. rs2853669, which was significant in the forward analysis
only, is located immediately 5' to the TERT transcription
start site, within a DNase I hypersensitivity region. ChIP-seq
data indicate that this SNP is situated within an RNA polymer- 
ase II binding site, whereas histone modification data suggest 
that it lies inside a weak enhancer element. This SNP is also 
predicted to disrupt an RBP-Jkappa TFBS and has previously 
been demonstrated to modify telomerase activity in lung 
cancer cells (19), providing further support for a direct func- 
tional effect arising from this SNP. Another SNP in region 3 
that ENCODE data suggests may exert a functional effect is 
rs2736108. This SNP lies within a DNase I hypersensitivity 
site, and ChIP-seq data indicate that it is within an EBF1
TFBS. This SNP did not itself remain significant after LR analysis;
however, it is very highly correlated with rs2736107 (r² =
0.95), the SNP in region 3 that remained significant after the multiple
regression analysis. Lastly, rs2736098, which is also correlated
with rs2736107 (r² = 0.8), is located within a DNase I
hypersensitivity region and is predicted to alter TFBS for 
NRSF and LRF. The SNP that defines region 4 according to 
all statistical models, rs13190087, has no obvious functional
effect itself; however, it is correlated with one other variant,
rs71595003 (r² = 0.67). This SNP overlaps a DNase I hypersensitivity
site, and ChIP-seq data also indicate that it overlaps TFBS
for TCF12 and MAFK, although it is also predicted to disrupt an 
aryl hydrocarbon receptor binding motif. 

In addition to the biological insights provided by the
ENCODE project, a previous study (20) showed that rs7705526 in region 1
and SNPs in region 3, including rs2736108, are strongly asso- 
ciated with mean TL in lymphocytes. Whereas the correlation 
between the region 1 TL SNP and our PrCa risk SNPs is weak, 
the variants associated with PrCa and TL are strongly correlated
in region 3 (r² = 0.94); therefore, it remains possible
that this region could influence PrCa risk through a 
TL-dependent mechanism. 

Overall, our results demonstrate that four sets of variants 
within a narrow interval at 5p15 are independently associated
with PrCa risk and that one of these regions significantly 
affects TERT expression. It has been reported previously that 
elevated TERT expression improves PrCa survival (21), and 
we have demonstrated that the lower risk alleles of variants 
in region 1 are associated with elevated TERT expression, 
thereby suggesting a plausible mechanism by which these var- 
iants could affect disease. Deep re-sequencing of this region 
may provide further insight by helping to uncover additional 
associated variants, further refine these loci and facilitate se- 
lection of prospective causal variants for functional validation 
studies. The phenomenon whereby multiple loci are subse- 
quently identified to explain an initial GWAS association 
signal has also been observed for other PrCa regions such as 
11q13 and 8q24 and highlights the value of fine-scale
mapping of risk associations to fully define their contribution 
to cancer susceptibility. 



MATERIALS AND METHODS 

Samples 

Samples for the iCOGS replication were drawn from 25 
studies participating in the PRACTICAL Consortium. The majority
of studies were population-based or hospital-based case-control
studies, or nested case-control studies; some studies
selected samples by age or oversampled for cases with a
family history of PrCa. In total, genotype data for 22 301 
PrCa cases and 22 320 matched controls were available after 
QC (10). A subset of 2831 cases and 2440 controls from the 
UKGPCS study were selected for genotyping by Sequenom 
iPlex MassARRAY technology. 

Genotyping of 5p15 SNPs on the iCOGS chip

All known SNPs from the March 2010 (Build 36) release 
of the 1000 Genomes Project with minor allele frequency 
>0.02 in Europeans in a 135 kb interval (Chr5: 1227693- 
1361669) encompassing the SLC6A18, TERT and CLPTM1L 
genes were identified. All SNPs correlated at r² > 0.1 with a
published cancer association, plus an additional tagging set 
to cover the remaining known SNPs, were included on the 
array. This generated a panel of 114 SNPs that were genotyped
using a custom Illumina Infinium array (iCOGS). 

Selection and genotyping of further SNPs 

Based on iCOGS data, the SNPs associated with PrCa clustered
within an ~20 kb interval (Chr5:1278590-1299850),
with no SNPs outside of this region showing evidence for association
(Fig. 1). Data from the 1000 Genomes Project (1000
Genomes August 2010 dataset called by Broad in Nov 2011 
across 283 European samples) indicated that the PrCa interval 
contained 104 putative SNPs, of which 52 had minor allele 
frequency (MAF) >2%. To fine-map the PrCa susceptibility
region at high depth, we used the Tagger feature of Haploview
to design a panel to capture all MAF >2% variants at r² >
0.9. These criteria required genotyping of 45 SNPs, 17 of
which had previously been genotyped on the iCOGS array
(6 were significant at P < 10⁻⁶, a further 3 at P < 10⁻⁴ and
the remainder showed no evidence of association). Additionally,
a proxy search using the 1000 Genomes Pilot 1 CEU panel
was performed to identify any further SNPs correlated at r² >
0.4 with rs2242652 or any of the iCOGS P < 10⁻⁴ SNPs. This
added a further 6 SNPs to the fine-mapping panel, bringing the
number of SNPs to be genotyped in addition to the iCOGS
array to 34.
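
As an illustration of the tagging criterion, the following Python sketch computes pairwise r² as the squared Pearson correlation of genotype dosages and then picks tag SNPs greedily. It is a simplified stand-in and not the Tagger algorithm as implemented in Haploview; the function names and the input layout (a dictionary of per-sample genotype vectors) are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2).

    Assumption: genotype-level correlation is used as an approximation to
    haplotype-based r²; monomorphic SNPs return 0.0.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def greedy_tags(genotypes: Dict[str, Sequence[float]],
                threshold: float = 0.9) -> List[str]:
    """Pick tag SNPs so that every SNP is a tag or has r² > threshold with one."""
    snps = list(genotypes)
    captures = {s: {s} for s in snps}  # each SNP captures itself
    for a, b in combinations(snps, 2):
        if r_squared(genotypes[a], genotypes[b]) > threshold:
            captures[a].add(b)
            captures[b].add(a)
    untagged = set(snps)
    tags: List[str] = []
    while untagged:
        # Choose the SNP capturing the most still-untagged SNPs
        # (ties resolved by input order).
        best = max(snps, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags
```

A pairwise greedy scheme of this kind keeps the panel small while guaranteeing that each common variant is either typed or highly correlated with a typed SNP.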

Genotyping assays were designed using the Sequenom MassARRAY
Assay Designer 4.0 software. During the assay
design process, assays could not be designed for nine SNPs in
RepeatMasked or segmentally duplicated regions, and these were
excluded. The remaining 25 SNPs were genotyped using the
Sequenom MassARRAY iPLEX Platform (Sequenom, San
Diego, CA, USA), of which 20 passed QC; SNPs were
excluded if more than 15% of samples failed.

All assays were performed in 384-well plates, including a 
mix of cases and controls, with 4 blank samples and 8 random 
duplicates for QC. Duplicate samples were 99.6% concordant. 

Imputation 

Imputation was performed on 22 301 cases and 22 320 control 
samples across 114 iCOGS SNPs from the TERT interval 
that passed pre-imputation QC metrics: missing genotypes 
<3%, MAF >0.01 and Hardy-Weinberg Equilibrium
among controls P < 10⁻⁶ (10). IMPUTE v2.2.2 (22) was
used to impute the interval Chr5:1227693-1361669 (GRCh37/hg19)
using a 1000 Genomes Phase 1 integrated variant set
(SNPs and indels) from 5 March 2012 (settings in Supplementary
Material, Figure S1).

This generated an iCOGS imputed dataset of 1094 SNPs. 
Concordance was checked by two methods; firstly, 5271 
samples were analysed for concordance across the 20 SNPs 
genotyped by Sequenom, but not on the iCOGS chip, with 
concordance of >90%. Secondly, the IMPUTE v2.2.2 'leave
one out' internal concordance check gave 86.3% concordance
at SNPs r² > 0.3 and 90.1% concordance at SNPs r² > 0.9
with the 114 SNPs on the iCOGS chip across all 44 621
samples (for a full breakdown by r², see Supplementary Material,
Table S6). Given the high concordance across both
methods, we performed imputation using a 1000 Genomes 
variant set alone, without implementing a two panel imputation. 
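
The first concordance check can be sketched as follows, assuming that concordance means the fraction of samples whose best-guess imputed genotype equals the directly genotyped call, with samples missing either call skipped. The function name and the coding of genotypes as 0, 1 or 2 are illustrative assumptions, not details taken from the analysis pipeline.

```python
from typing import Optional, Sequence


def concordance(imputed: Sequence[Optional[int]],
                genotyped: Sequence[Optional[int]]) -> float:
    """Fraction of samples with identical imputed and genotyped calls.

    Genotypes are coded 0/1/2; None marks a missing call (assumed coding).
    """
    pairs = [(i, g) for i, g in zip(imputed, genotyped)
             if i is not None and g is not None]
    if not pairs:
        raise ValueError("no samples with both calls available")
    return sum(i == g for i, g in pairs) / len(pairs)
```

For example, `concordance([0, 1, 2, None, 1], [0, 1, 1, 2, 1])` compares four usable samples, of which three agree.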

Statistical analysis 

Association tests were performed on genotypes in the MaCH 
dosage format (0-2) converted from the IMPUTE genotype 
posterior probabilities using GenABEL (23), and haplotype 
analyses were performed on 'best guess' genotypes converted 
using GenGen; calls were generated only if the posterior probability
was higher than 0.9, unless otherwise stated.
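
The two genotype representations described above can be illustrated with a short Python sketch. It assumes that each imputed genotype is given as three posterior probabilities ordered (AA, AB, BB) and that the dosage counts copies of allele B; the function names are hypothetical and do not correspond to GenABEL or GenGen routines.

```python
from typing import Optional, Sequence


def dosage(probs: Sequence[float]) -> float:
    """Expected number of B alleles (0-2) from (P(AA), P(AB), P(BB))."""
    _, p_ab, p_bb = probs
    return p_ab + 2.0 * p_bb


def best_guess(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Most likely genotype (0, 1 or 2), or None if its posterior
    probability does not exceed the threshold."""
    call = max(range(3), key=lambda g: probs[g])
    return call if probs[call] > threshold else None
```

Dosages retain the imputation uncertainty in the association tests, whereas the thresholded calls give the discrete genotypes needed for haplotype analysis at the cost of some missing data.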

Associations between each SNP and PrCa risk were ana- 
lysed using a per-allele trend test, adjusted for study and six 
principal components (10). Odds ratios (ORs) and 95% confidence
limits were estimated using unconditional LR. Homogeneity
of the ORs across strata was assessed using a
likelihood ratio test. SNPs significant at P < 10⁻⁵ were considered
for further analysis. To determine independently associated
SNPs, we used forward and backward stepwise LR;
SNPs were included in the model if they were significant at
P < 10⁻⁴ after adjustment for other SNPs. To further assess
the independence of these associations, an additional LR ana- 
lysis was performed using the SNPs retained in these models. 
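
For intuition, a 1 df per-allele trend statistic can be computed from genotype counts with the Cochran-Armitage test, as in the Python sketch below. This is an unadjusted simplification: the analysis described above used LR with adjustment for study and principal components, which the sketch does not reproduce. The function name and any counts passed to it are illustrative.

```python
import math
from typing import Sequence, Tuple


def trend_test(cases: Sequence[int],
               controls: Sequence[int]) -> Tuple[float, float]:
    """Cochran-Armitage trend test with allele-count weights 0, 1, 2.

    `cases` and `controls` hold counts of the three genotypes.
    Returns (chi-square statistic with 1 df, two-sided P-value).
    """
    weights = (0, 1, 2)
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * c for t, c in zip(weights, totals))
    sum_t2n = sum(t * t * c for t, c in zip(weights, totals))
    numerator = n * (n * sum_tr - n_cases * sum_tn) ** 2
    denominator = n_cases * n_controls * (n * sum_t2n - sum_tn ** 2)
    if denominator == 0:
        raise ValueError("test undefined: no variation in genotype or status")
    chi2 = numerator / denominator
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With hypothetical counts of (30, 50, 20) in cases and (50, 40, 10) in controls, `trend_test` returns a statistic of about 9.23.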

Haplotype analyses (Chi-squared test) were performed 
using Unphased 3.16 (24) using all marker combinations and 
a window size of two. Haplotypes were filtered to select 
only haplotypes containing the top SNP and with a P-value 
smaller than that of any single marker. These haplotypes 
were then rerun in PLINK (25) (LR), to correct for the same 
covariates used in the original association analyses. 
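
The haplotype filtering rule can be written down compactly. The Python sketch below assumes that haplotype results are available as simple records and that single-marker P-values are held in a dictionary; the class, the function name and this data layout are hypothetical and are not the output format of Unphased or PLINK.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HaplotypeResult:
    markers: Tuple[str, ...]  # SNPs in the two-marker window
    alleles: str              # haplotype alleles, e.g. "A/A"
    p_value: float            # haplotype association P-value


def select_haplotypes(results: List[HaplotypeResult],
                      single_marker_p: Dict[str, float],
                      top_snp: str) -> List[HaplotypeResult]:
    """Keep haplotypes that contain the top SNP and are more significant
    than every single marker included in the haplotype."""
    selected = []
    for hap in results:
        if top_snp not in hap.markers:
            continue
        best_single = min(single_marker_p[m] for m in hap.markers)
        if hap.p_value < best_single:
            selected.append(hap)
    return selected
```

Only haplotypes passing this filter would then be re-analysed with covariate adjustment, which keeps the number of adjusted models small.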

Gene expression analysis 

Tissue sections were obtained from biopsies taken from fresh 
frozen radical prostatectomy samples of 195 European men 
(mean age 61.5 years). Ten to 14 cores from each biopsy 
sample were excised, and the pathology of each core was deter- 
mined based on the H&E staining of the two adjacent sections. 
All patients who underwent surgery had elevated (>3 ng/ml) 
PSA levels (mean PSA 9.52 ng/ml, range 3.4-40 ng/ml). 
qPCR assays were performed using the Fluidigm Biomark™ 
HD system with 48 x 48 and 96 x 96 dynamic array plates 
according to the manufacturer's instructions. TaqMan assays 
for TERT Hs00972656_m1 and Hs00972649_m1 were tested,
but only assay Hs00972656_m1 worked reliably, so all data
generated were based on this. Each assay was performed in
triplicate on each plate, and at least two replicate plates
were run for each assay. Other TaqMan assays included
Hs00363947_m1 (CLPTM1L), 4319413E (18S RNA) and
4326315E (β-actin). Data for all repeats were normalized to
housekeeping genes β-actin and 18S RNA. Multiple 'no template'
control samples were included in each reaction plate.
Data were also normalized across reaction plates through the 
inclusion of three commercially sourced 'control' RNA 
samples across all reaction plates. Clontech qPCR human reference
total RNA (Clontech, Mountain View, CA, USA, Cat
No. 636690), Ambion FirstChoice human brain RNA reference
(Life Technologies Corporation, Carlsbad, CA, USA,
Cat No. AM6050) and Applied Biosystems' TaqMan control
total RNA (human) (Life Technologies Corporation, Carlsbad,
CA, USA, Cat No. 4307281) also acted as positive controls
for target gene expression. In addition, 1000 permutation tests 
were performed on the available data. Hits with Kruskal- 
Wallis P < 0.05 were considered significant. 
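
As a sketch of the expression test, the following Python code computes the Kruskal-Wallis statistic for normalized expression values grouped by the three genotype classes of a SNP. It uses the asymptotic chi-square approximation with 2 df rather than the permutation procedure mentioned above, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Ranks (1-based, ties given their average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes: List[int] = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_3(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Kruskal-Wallis H and asymptotic P-value for exactly three groups."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("expected three non-empty genotype groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    # Chi-square survival function with 2 df has the closed form exp(-h / 2).
    return h, math.exp(-h / 2.0)
```

Restricting the sketch to three groups keeps the P-value in closed form; a general implementation would need the chi-square distribution for an arbitrary number of groups.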

URLS 

http://ec.europa.eu/research/health/medical-research/cancer/fp7-projects/cogs_en.html

http://ccge.medschl.cam.ac.uk/consortia/practical 

http://pngu.mgh.harvard.edu/purcell/plink/ 

http://mathgen.stats.ox.ac.uk/impute/impute_v2.html

http://www.openbioinformatics.org/gengen/index.html 

http://genome.ucsc.edu/ 

http://www.broadinstitute.org/mammals/haploreg/haploreg.php 

SUPPLEMENTARY MATERIAL 

Supplementary Material is available at HMG online. 